feat(pandas): Flopy pandas support#1955
feat(pandas): Flopy pandas support#1955spaulins-usgs merged 1 commit intomodflowpy:developfrom scottrp:develop
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #1955 +/- ##
=========================================
- Coverage 72.6% 71.7% -1.0%
=========================================
Files 257 258 +1
Lines 57800 57412 -388
=========================================
- Hits 42017 41179 -838
- Misses 15783 16233 +450
|
|
linking back to comment on original PR, sorry again for the accidental close |
| """ | ||
| self._set_data(data, check_data=check_data) | ||
|
|
||
| def set_record(self, data_record, autofill=False, check_data=True): |
There was a problem hiding this comment.
should this be "set_data_record" to be consistent with the variable that it is setting? Or set_control_record?
There was a problem hiding this comment.
Is the hidden and exposed method necessary for this? set_record() is only calling _set_record()
There was a problem hiding this comment.
Variable name changed to be more consistent with method name. Changed variable name instead of method name since method name change would breaking existing interface.
Having the hidden method is not necessary. I was originally doing this to make sure the correct method (parent vs child class) got called. But it is better to just explicitly define this, which I am now doing.
| """ | ||
| self._set_record(data_record, autofill, check_data) | ||
|
|
||
| def _set_record(self, data_record, autofill=False, check_data=True): |
There was a problem hiding this comment.
should this be _set_data_record to be consistent with the variable it is setting?
There was a problem hiding this comment.
Renamed variables to be consistent with method name
| self._resync() | ||
| try: | ||
| # convert to tuple | ||
| tuple_record = () |
There was a problem hiding this comment.
I'm not sure I understand why lists are being converted to tuples here. It looks like there is support for lists in .append_data.
There was a problem hiding this comment.
And if this is a single list record, could this just be tuple(record).
There was a problem hiding this comment.
Removed the conversion code and now passing list instead of tuple.
| ex, | ||
| ) | ||
|
|
||
| def update_record(self, record, key_index): |
There was a problem hiding this comment.
this would be clearer if key_index was kper or stress_period, etc...
There was a problem hiding this comment.
This interface is also used for packages like Time-Array Series whose "TIME" block (BEGIN TIME <tas_time>) has a key that is not a stress period. Similarly, the Observation package "CONTINUOUS" block (BEGIN CONTINUOUS FILEOUT <obs_output_file_name>) has a key that is a file name. I therefore choose the generalized name "key_index".
| message = ( | ||
| f"ERROR: Data list {self._data_name} supplied the " | ||
| f"wrong number of columns of data, expected " | ||
| f"{len(self._data_item_names)} got {len(data[0])}." | ||
| ) | ||
| type_, value_, traceback_ = sys.exc_info() | ||
| raise MFDataException( | ||
| self._data_dimensions.structure.get_model(), | ||
| self._data_dimensions.structure.get_package(), | ||
| self._data_dimensions.structure.path, | ||
| "setting list data", | ||
| self._data_dimensions.structure.name, | ||
| inspect.stack()[0][3], | ||
| type_, | ||
| value_, | ||
| traceback_, | ||
| message, | ||
| self._simulation_data.debug, | ||
| ) |
There was a problem hiding this comment.
It seems like we could either return the name of the missing column (or extra column) here to provide a more detailed error message.
There was a problem hiding this comment.
I can not easily determine the name of the missing column since this code accepts Pandas Dataframes with incorrect column names (it corrects the column names below, which is trivial to do given that the Dataframe has the correct number of columns). I did however make a more detailed error message that lists the data column names supplied and expected.
| if isinstance(dataset_one, mfdataplist.MFPandasList) or isinstance( | ||
| dataset_one, mfdataplist.MFPandasTransientList | ||
| ): |
There was a problem hiding this comment.
This could be simplified to isinstance(dataset_one, (mfdataplist.MFPandasList, mfdataplist.MFPandasTransientList))
| assert isinstance( | ||
| dataset, mfdataplist.MFPandasList | ||
| ) or isinstance(dataset, mfdataplist.MFPandasTransientList) |
There was a problem hiding this comment.
isinstance simplification here too
| write_headers=True, | ||
| lazy_io=False, | ||
| use_pandas=True, |
There was a problem hiding this comment.
What does write_headers write?
There was a problem hiding this comment.
write_headers: bool
When true flopy writes a header to each package file indicating that
it was created by flopy.
| elif isinstance( | ||
| value, mfdatalist.MFTransientList | ||
| ) or isinstance(value, mfdataplist.MFPandasTransientList): |
There was a problem hiding this comment.
The isinstance statement can be simplified `isinstance(value, (mfdatalist.MFTransientList, mfdataplist.MFPandasTransientList))
| elif isinstance(value, mfdatalist.MFList) or isinstance( | ||
| value, mfdataplist.MFPandasList |
There was a problem hiding this comment.
Same isinstance comment.
This PR is the first step towards integrating Pandas into Flopy. This integration takes place in the MFPandasList and MFPandasTransientList classes (MFPandas*), which are used instead of the MFList and MFTransientList classes under the following conditions:
The MFPandas* classes currently support the same interface as MFList and MFTransientList, and should behave similarly to the end-user. However, MFPandas* stores data internally in a Pandas Dataframe and reads and writes data using Pandas “read_csv” and “to_csv” methods, which can be significantly faster than flopy’s current file reading. The MFPandas* classes set_data methods support DataFrames and their new “get_dataframe” method returns data in a Panda’s Dataframe (“get_data” still returns a recarray).
Remaining work on this PR includes: